Summarizing Search Results using PLSI

نویسندگان

  • Jun Harashima
  • Sadao Kurohashi
چکیده

In this paper, we investigate generating a set of query-focused summaries from search results. Since there may be many topics related to a given query in the search results, in order to summarize these results, they should first be classified into topics, and then each topic should be summarized individually. In this summarization process, two types of redundancies need to be reduced. First, each topic summary should not contain any redundancy (we refer to this problem as redundancy within a summary). Second, a topic summary should not be similar to any other topic summary (we refer to this problem as redundancy between summaries). In this paper, we focus on the document clustering process and the reduction of redundancy between summaries in the summarization process. We also propose a method using PLSI to summarize search results. Evaluation results confirm that our method performs well in classifying search results and reducing the redundancy between summaries.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Similarity Search for Multi-dimensional NMR-Spectra of Natural Products

Searching and mining nuclear magnetic resonance (NMR)spectra of naturally occurring products is an important task to investigate new potentially useful chemical compounds. We develop a set-based similarity function, which, however, does not sufficiently capture more abstract aspects of similarity. NMR-spectra are like documents, but consists of continuous multi-dimensional points instead of wor...

متن کامل

An Evaluation of Text Retrieval Methods for Similarity Search of multi-dimensional NMR-Spectra

Searching and mining nuclear magnetic resonance (NMR)spectra of naturally occurring substances is an important task to investigate new potentially useful chemical compounds. Multi-dimensional NMR-spectra are relational objects like documents, but consists of continuous multi-dimensional points called peaks instead of words. We develop several mappings from continuous NMR-spectra to discrete tex...

متن کامل

PLSI Utilization for Automatic Thesaurus Construction

When acquiring synonyms from large corpora, it is important to deal not only with such surface information as the context of the words but also their latent semantics. This paper describes how to utilize a latent semantic model PLSI to acquire synonyms automatically from large corpora. PLSI has been shown to achieve a better performance than conventional methods such as tf·idf and LSI, making i...

متن کامل

Nonnegative Matrix Factorization and Probabilistic Latent Semantic Indexing: Equivalence Chi-Square Statistic, and a Hybrid Method

Non-negative Matrix Factorization (NMF) and Probabilistic Latent Semantic Indexing (PLSI) have been successfully applied to document clustering recently. In this paper, we show that PLSI and NMF optimize the same objective function, although PLSI and NMF are different algorithms as verified by experiments. This provides a theoretical basis for a new hybrid method that runs PLSI and NMF alternat...

متن کامل

Note on Algorithm Differences Between Nonnegative Matrix Factorization And Probabilistic Latent Semantic Indexing

NMF and PLSI are two state-of-the-art unsupervised learning models in data mining, and both are widely used in many applications. References have shown the equivalence between NMF and PLSI under some conditions. However, a new issue arises here: why can they result in different solutions since they are equivalent? or in other words, their algorithm differences are not studied intensively yet. I...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2010